Bayesian Models for Re-identification of Trucks over Long Distances Based on Axle Measurement Data
Authors
Abstract
Vehicle re-identification methods can be used to anonymously match vehicles crossing two different locations based on vehicle attribute data. Most of the existing work in this area focuses on matching vehicles between two locations that are separated by less than a mile or so. In this paper, re-identification methods are developed to match commercial vehicles that cross two weigh-in-motion sites in Oregon that are separated by 145 miles. Using vehicle length and axle data as attributes to characterize vehicles, a Bayesian model for vehicle re-identification is developed that uses probability density functions obtained by fitting statistical mixture models to a sample dataset of matched vehicles. When applied to a test dataset, this model matches vehicles with an accuracy of 91% when both axle weight and axle spacing data are used. An additional new model is developed to screen mismatched vehicles produced by the re-identification algorithm. This screening model allows the user to trade off the total number of matched vehicles against the error rate. It is shown that the mismatch error can be reduced to as low as 1% (i.e., 99% accurate matching) at the expense of not matching about 25% of the vehicles. Overall, for travel time estimation purposes, the methods presented in this paper can be used effectively to match commercial vehicles crossing two data collection sites that are separated by long distances.

Cetin, Monsere, and Nichols

INTRODUCTION

Most transportation agencies rely on point detectors (e.g., inductive loops, axle detectors) located at specific points on highways to collect data on traffic volumes, vehicle classes, and other relevant attributes of traffic. By utilizing the data collected from these point detectors, researchers have developed vehicle re-identification algorithms to match measurements at two sites that belong to the same vehicle.
This enables tracking the movement of individual vehicles between different data collection sites, which in turn provides valuable information for the estimation of travel times, travel delays, and origin-destination flows.

Even though there are other technologies that can be utilized to track the movements of vehicles over transportation networks, most of these technologies (e.g., automatic vehicle identification (AVI) tags, license plate recognition) require installation of additional in-car and/or roadside devices and may raise privacy concerns. In contrast, vehicle re-identification methods that are based on the vehicle attribute data collected by sensors already installed on roadways enable tracking vehicles anonymously and do not require substantial additional investment. There have been several studies on re-identifying individual vehicles at multiple locations by utilizing data from existing inductive dual loop detectors (Sun et al. 1999; Coifman and Cassidy 2002; Coifman 2003). While most of the previous studies are based on data from dual loops, some researchers have also extended the application of the re-identification algorithms to data from single loops (Coifman and Krishnamurthy 2007). The predominant application of these methods has been to estimate travel times to characterize link performance (Liu et al. 2002; Oh et al. 2005; Sun et al. 2003).

Vehicle re-identification methods rely on the variability within the vehicle population and the ability to accurately identify the pairs of measurements collected at upstream and downstream stations that are generated by the same vehicle. These measurements can either be the actual physical attributes of vehicles, such as length (Coifman and Cassidy 2002) and axle spacing (Cetin and Nichols 2009), or some characteristics of the sensor waveform or inductive vehicle signature (Sun et al. 1999).
Researchers have developed various methods, such as lexicographic optimization (Sun et al. 1999; Oh et al. 2007) and decision trees (Tawfik et al. 2004), to re-identify vehicles. In a typical implementation of these methods, a downstream vehicle is matched to the most "similar" upstream vehicle (or vice versa) based on some defined metric (e.g., Euclidean distance). The resulting accuracy of these methods depends on several factors, including the variation of the attribute data from vehicle to vehicle, the number of attributes, the distance between data collection stations, the variability of travel time, and the type of re-identification algorithm used. Given a particular set of factors, this accuracy may or may not be satisfactory for a given application. It would be desirable to have a model that can "adjust" the level of accuracy by being more judicious in matching vehicles. In other words, the model should match a select set of vehicles rather than all vehicles, such that the accuracy is maintained at an acceptable level. This paper presents a new approach showing how this can be done effectively.

The vehicle re-identification method proposed in this paper consists of two main stages. In the first stage, each vehicle from the downstream station is matched to the most "similar" upstream vehicle, as is typically done in vehicle re-identification methods. Both a Euclidean distance method and a Bayesian method are utilized to solve the first-stage problem. In the second stage, a new method is proposed to trade off accuracy against the total number of vehicles being matched. This method involves calculating both the highest and the second highest similarity measures for each vehicle being matched. Several criteria are suggested and evaluated for screening the mismatched vehicles of the first stage based on these similarity measures.
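
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `match_with_screening`, the `max_ratio` threshold, and the attribute values are all hypothetical, and the screening rule shown (accept a match only when the best distance is much smaller than the second-best distance) is just one plausible criterion built on the two similarity measures, assumed here for illustration. Only the Euclidean distance variant of the first stage is shown.

```python
import math

def match_with_screening(downstream, upstream, max_ratio=0.5):
    """Stage 1: match each downstream vehicle to its nearest upstream vehicle.
    Stage 2: flag the match as accepted only if it is a clear winner.

    Requires at least two upstream candidates.
    Returns a list of (best_upstream_index, accepted) tuples.
    """
    results = []
    for d in downstream:
        # Euclidean distance from this downstream vehicle to every candidate.
        dists = sorted((math.dist(d, u), i) for i, u in enumerate(upstream))
        (best, best_i), (second, _) = dists[0], dists[1]
        # Hypothetical screening rule (an assumption, not from the paper):
        # keep the match only when best is well below second-best.
        accepted = second > 0 and best / second <= max_ratio
        results.append((best_i, accepted))
    return results

# Made-up axle spacings in feet, for illustration only.
upstream = [(17.0, 4.3, 33.0, 4.1), (12.5, 4.4, 31.0, 4.0), (17.2, 4.3, 32.8, 4.1)]
downstream = [(12.6, 4.4, 31.1, 4.0), (17.1, 4.3, 32.9, 4.1)]
matches = match_with_screening(downstream, upstream)
```

In this toy data the first downstream vehicle has one clearly closest candidate and is accepted, while the second sits almost exactly between two upstream vehicles and is screened out. Raising `max_ratio` keeps more matches at the cost of admitting more ambiguous ones, which is the trade-off the second stage is meant to control.
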
As demonstrated in the paper, the proposed screening approach improves the accuracy of the re-identification methods significantly. The models are applied to the truck data collected by weigh-in-motion (WIM) and automatic vehicle classification (AVC) sensors at two stations in Oregon separated by 145 miles.

Overall, this paper contributes to the re-identification literature in two fundamental ways: (i) a screening approach is developed to reduce the matching error to a desired level; and (ii) the re-identification models are applied to the data collected from two stations that are separated by 145 miles, which seems to be the first attempt to match vehicles over such large distances. Furthermore, archived vehicle data are used here, as opposed to special vehicle attribute data collected by sensors.

DATASET FOR MODEL DEVELOPMENT AND TESTING

The existing intelligent transportation systems (ITS) infrastructure for motor carrier weight and safety enforcement in Oregon, Green Light, provides the data for the analysis in this paper. The Oregon Department of Transportation (ODOT) Motor Carrier Division has equipped 22 fixed weigh stations with AVI antennas, WIM scales, and over-height detection equipment. The Green Light program allows motor carriers that are equipped with transponders, are not overweight, and have valid credentials to bypass the weigh stations. Participation in the Green Light program is substantial; on average, about 40% of observed vehicles are equipped with transponders (though this varies from station to station). The unique aspect of Oregon's system is that the transponder and weight-related data are available together in one record. These transponder-equipped vehicles provide a large pool of data to develop, validate, and test the vehicle re-identification techniques proposed in this paper.
Data from these stations are received monthly via FTP transfer from ODOT to Portland State University. Each observation is loaded in the WIM data archive housed within the Portland Transportation Archive Listing (PORTAL) umbrella (Bertini et al. 2005).

To conduct the analysis described in the following sections of this paper, a subset of the database was prepared. The data stored in the archive have not been processed for data quality and measurement errors that are known to exist; thus, the objective of the subset procedure was to identify a month of transponder-matched vehicles from a pair of stations with a minimum of sensor error. The subset was developed by analyzing matched vehicles (based on AVI transponder numbers) that are classified as five-axle semi-trucks (FHWA Class 9) between station pairs for 2007. For all station pairs, density plots of the ratio of the upstream measurements to the downstream measurements were created for four different metrics: total truck length, distance in feet between axles 2 and 3 (the tandem drive axles), the total number of axles, and axle 1 (steering) weight. Previous work (Dahlin 1992; Nichols and Cetin 2007) has shown that the weight of the steering axle is fairly constant for any truck loading condition, and the other measures are physical parameters of the truck that should not change. If the upstream and downstream sensors are calibrated in exactly the same way, a density plot of the ratio should be tightly distributed around 1.0. Inspection of the density plots found that the station pair with the least error was Klamath Falls and Lowell. Trucks traveling between these stations most likely traverse US-97 from just north of the California border north to the junction with OR-58, where the route heads northwest over the Oregon Cascade mountains (there are no feasible alternate routes).
This route is mostly a 2-lane primary rural highway. To further narrow the subset to one month in 2007, the above metrics and additional variables of interest for the re-identification algorithm (lengths between each axle pair, and the weights for each axle) were considered. Upon inspection, there did not appear to be much month-to-month variation for this station pair; however, October 2007 seemed to show the most consistent agreement between the upstream and downstream detectors. As such, the records collected in October 2007 at these two stations were used for the remainder of the analysis.

Table 1 shows the number of upstream and downstream vehicles observed at these two stations for October 2007. For model training and testing, vehicles that cross both stations need to be identified. Based on transponder numbers and time stamps, approximately 3,100 common truck trips (including all truck classes available in the dataset) are identified for this purpose, of which the first 2,100 (with the data sorted by time stamp) are used for training the models. The remaining 1,000 vehicles are used for testing the models. As explained in the next section, vehicles at the downstream station are matched to vehicles at the upstream station. Therefore, every one of the 1,000 downstream vehicles will be matched to an upstream vehicle. However, in this research, the upstream data contain not only the 1,000 common vehicles but also other vehicles that cross the station around the same time periods. This resulted in selecting a total of 10,581 upstream vehicles, some of which do not have transponders.

Table 1 Number of Trucks Observed at the Upstream and Downstream Stations, October 2007

Trucks                  Upstream (KFP)   Downstream (LWL)
With Transponders       9,079            3,981
Without Transponders    17,496           11,977
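
The chronological training and testing split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`trip_id`, `timestamp`) and the function name `chronological_split` are hypothetical, the time stamps are synthetic, and the counts 3,100 and 2,100 are the rounded figures quoted in the text rather than exact dataset sizes.

```python
def chronological_split(trips, n_train):
    """Sort matched trips by time stamp; the earliest n_train form the
    training set and the remainder form the test set."""
    ordered = sorted(trips, key=lambda trip: trip["timestamp"])
    return ordered[:n_train], ordered[n_train:]

# Hypothetical matched trips with synthetic, shuffled time stamps.
trips = [{"trip_id": i, "timestamp": (i * 7919) % 3100} for i in range(3100)]
train, test = chronological_split(trips, n_train=2100)
```

Splitting by time rather than at random means every test trip occurs after every training trip, so the models are evaluated on observations later than those used to fit them.
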